    Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures

    Efficient implementations of parallel applications on heterogeneous hybrid architectures require a careful balance between computations and communications with accelerator devices. Even if most of the communication time can be overlapped by computations, it is essential to reduce the total volume of communicated data. The literature therefore abounds with ad-hoc methods to reach that balance, but they are architecture and application dependent. We propose here a generic mechanism to automatically optimize the scheduling between CPUs and GPUs, and compare two strategies within this mechanism: the classical Heterogeneous Earliest Finish Time (HEFT) algorithm and our new, parametrized, Distributed Affinity Dual Approximation algorithm (DADA), which consists of grouping the tasks by affinity before running a fast dual approximation. We ran experiments on a heterogeneous parallel machine with six CPU cores and eight NVIDIA Fermi GPUs. Three standard dense linear algebra kernels from the PLASMA library have been ported on top of the Xkaapi runtime, and we report their performance. The results show that HEFT and DADA both perform well under various experimental conditions, but that DADA performs better for larger systems and numbers of GPUs and, in most cases, generates much less data transfer than HEFT to achieve the same performance.
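
    As an illustration of the baseline strategy, HEFT can be sketched in a few lines of Python (a minimal, non-insertion variant; the task graph, costs, and communication times are hypothetical inputs, not Xkaapi data structures):

```python
def heft_schedule(tasks, succ, cost, comm, procs):
    """Simplified HEFT: rank tasks by upward rank, then greedily place
    each task on the resource giving the earliest finish time."""
    # Derive predecessor lists from the successor map.
    preds = {t: [] for t in tasks}
    for t in tasks:
        for s in succ.get(t, []):
            preds[s].append(t)

    # Average execution cost of each task over all resources.
    avg = {t: sum(cost[t][p] for p in procs) / len(procs) for t in tasks}

    # Upward rank: average cost plus the heaviest path to an exit task.
    rank = {}
    def upward_rank(t):
        if t not in rank:
            rank[t] = avg[t] + max(
                (comm.get((t, s), 0.0) + upward_rank(s)
                 for s in succ.get(t, [])),
                default=0.0)
        return rank[t]
    for t in tasks:
        upward_rank(t)

    # Greedy placement in decreasing rank order.
    proc_free = {p: 0.0 for p in procs}
    placed = {}  # task -> (resource, start, finish)
    for t in sorted(tasks, key=lambda u: -rank[u]):
        best = None
        for p in procs:
            est = proc_free[p]
            for q in preds[t]:
                qp, _, qf = placed[q]
                # Pay the communication cost only across resources.
                est = max(est, qf + (comm.get((q, t), 0.0) if qp != p else 0.0))
            finish = est + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, est, finish)
        placed[t] = best
        proc_free[best[0]] = best[2]
    return placed
```

    Tasks are placed in decreasing order of upward rank, so every predecessor is scheduled before its successors; DADA instead first groups the tasks by affinity, to reduce data transfers, before running its dual approximation.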

    Mixed precision bisection

    We discuss the implementation of the bisection algorithm for the computation of the eigenvalues of symmetric tridiagonal matrices in a context of mixed precision arithmetic. This approach is motivated by the emergence of processors which carry out floating-point operations much faster in single precision than they do in double precision. Perturbation theory results are used to decide when to switch from single to double precision. Numerical examples are presented.
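
    The precision-switching idea can be sketched with NumPy (an illustrative toy, not the paper's implementation: the fixed switch threshold, the widened double-precision interval, and the absence of safeguards for clustered eigenvalues are all simplifying assumptions):

```python
import numpy as np

def sturm_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix
    (diagonal d, off-diagonal e) strictly less than x, via the
    classic Sturm sequence recurrence."""
    count = 0
    q = d[0] - x
    if q < 0:
        count += 1
    for i in range(1, len(d)):
        if q == 0:
            q = np.finfo(d.dtype).tiny  # guard against division by zero
        q = d[i] - x - e[i - 1] * e[i - 1] / q
        if q < 0:
            count += 1
    return count

def bisect_eigenvalue(d, e, k, dtype, tol):
    """Approximate the k-th smallest eigenvalue by bisection, carrying
    out the Sturm counts in the given precision."""
    d = d.astype(dtype)
    e = e.astype(dtype)
    # Gershgorin bounds enclose the whole spectrum.
    r = np.concatenate(([abs(e[0])], np.abs(e[:-1]) + np.abs(e[1:]), [abs(e[-1])]))
    lo, hi = float(np.min(d - r)), float(np.max(d + r))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, dtype(mid)) <= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mixed_precision_eigenvalue(d, e, k):
    """Run most bisection steps in single precision, then finish in
    double precision once the interval nears single-precision accuracy."""
    coarse = bisect_eigenvalue(d, e, k, np.float32, tol=1e-4)
    # Restart a narrow double-precision search around the coarse value
    # (assumes the eigenvalue is isolated within this window).
    d64, e64 = d.astype(np.float64), e.astype(np.float64)
    lo, hi = coarse - 1e-3, coarse + 1e-3
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if sturm_count(d64, e64, mid) <= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    The Sturm-sequence count makes each bisection step cheap in either precision; the paper's contribution is a perturbation-based criterion for when the single-precision counts stop being trustworthy, which the fixed threshold above merely imitates.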

    Track running shoes: a case report of the transition from classical spikes to "super spikes" in track running

    Research on high-tech running shoes is increasing, but few studies are available on the use of high-tech track spike shoes (super spikes), despite their growing popularity among running athletes. The aim of this case study was to investigate the kinematics, kinetics, and plantar pressures of an Olympic running athlete using two different types of shoes, to provide an easy and replicable method to assess their influence on running biomechanics. The tested athlete performed six running trials, at the same speed, wearing a pair of normal spike shoes (NSS) and a pair of super spike shoes (SSS), in random order. SSS increased contact time, vertical impact, and swing force (Effect Size 3.70, 7.86, and 1.31, respectively), while it reduced foot-strike type and vertical ground reaction force rate (Effect Size 3.62 and 7.21, respectively). Moreover, a significant change was observed in medial and lateral load, with SSS inducing a more symmetrical load distribution between the left and right feet compared to NSS (SSS: left medial load 57.1 +/- 2.1%, left lateral load 42.9 +/- 1.4%, right medial load 55.1 +/- 2.6%, right lateral load 44.9 +/- 2.6%; NSS: left medial load 58.4 +/- 2.6%, left lateral load 41.6 +/- 2.1%, right medial load 49.2 +/- 3.7%, right lateral load 50.8 +/- 3.7%). The results of this case study suggest the importance of individual evaluation methods to assess shoe adaptations in running athletes, which can induce biomechanical modifications and should be considered by coaches to ensure optimal running performance.

    Cerebrospinal fluid levels of L-glutamate signal central inflammatory neurodegeneration in multiple sclerosis

    Excessive extracellular concentrations of L-glutamate (L-Glu) can be neurotoxic and contribute to neurodegenerative processes in multiple sclerosis (MS). The association between cerebrospinal fluid (CSF) L-Glu levels, clinical features, and inflammatory biomarkers in patients with MS remains unclear. In 179 MS patients (relapsing remitting, RR, N = 157; secondary progressive/primary progressive, SP/PP, N = 22), CSF levels of L-Glu at diagnosis were determined and compared with those obtained in a group of 40 patients with non-inflammatory/non-degenerative disorders. Disability at the time of diagnosis, and after 1 year follow-up, was assessed using the Expanded Disability Status Scale (EDSS). CSF concentrations of lactate and of a large set of pro-inflammatory and anti-inflammatory molecules were explored. CSF levels of L-Glu were slightly reduced in MS patients compared to controls. In RR-MS patients, L-Glu levels correlated with EDSS after 1 year follow-up. Moreover, in MS patients, significant correlations were found between L-Glu and both CSF levels of lactate and the inflammatory molecules interleukin (IL)-2, IL-6, and IL-1 receptor antagonist. Altered expression of L-Glu is associated with disability progression, oxidative stress, and inflammation. These findings identify CSF L-Glu as a candidate neurochemical marker of inflammatory neurodegeneration in MS.

    Parallel computation of echelon forms

    We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay these reductions as much as possible while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using the Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between block sizes well suited to those fast variants and block sizes suited to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, the block size has to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs best and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.
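
    The first difficulty, delaying modular reductions, can be illustrated with a toy NumPy sketch (not the paper's kernels; the 64-bit overflow bound is the assumption that makes a single final reduction valid):

```python
import numpy as np

def matmul_mod_delayed(A, B, p):
    """Matrix product over Z/pZ with a single, delayed modular
    reduction: accumulate in 64-bit integers and reduce once at the
    end, instead of reducing after every multiply-add."""
    k = A.shape[1]
    # Validity condition: the largest possible accumulated value,
    # k * (p-1)^2, must fit in a signed 64-bit integer.
    assert k * (p - 1) ** 2 < 2 ** 63
    return (A.astype(np.int64) @ B.astype(np.int64)) % p
```

    When the inner dimension is too large for the bound, the accumulation is split into chunks that are each reduced once, so the number of reductions grows with the number of chunks rather than with every multiply-add.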

    GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

    In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear equation systems. Although they take very different approaches, they share the basic idea of compensating for the weaker convergence properties of an inferior numerical algorithm by a more efficient usage of the available computing power. In this paper, we analyze the potential of combining both techniques: we derive a mixed precision iterative refinement algorithm using a block-asynchronous iteration as the error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and with an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Sparse Matrix Collection, we report the convergence behaviour and the total solver runtime on different GPU architectures.
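
    The outer refinement loop can be sketched as follows (a dense NumPy stand-in: a float32 direct solve plays the role of the GPU-side error correction solver, whether block-asynchronous or plain double precision):

```python
import numpy as np

def mixed_precision_refinement(A, b, tol=1e-12, max_iter=50):
    """Iterative refinement: a cheap, low-precision inner solver
    produces corrections, while residuals and the running solution
    are kept in double precision."""
    A32 = A.astype(np.float32)
    # Stand-in for the inner solver; in the paper's setting this is
    # where the (block-asynchronous) GPU solver would run.
    inner_solve = lambda r: np.linalg.solve(A32, r.astype(np.float32))
    x = inner_solve(b).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                      # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                          # converged to double precision
        x += inner_solve(r).astype(np.float64)
    return x
```

    Each pass costs one low-precision solve plus one double-precision residual, which is the trade the paper exploits: the inferior (single precision, possibly asynchronous) solver runs where the hardware is fastest, and the refinement loop recovers full accuracy.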

    INTEGRAL/SPI data segmentation to retrieve sources intensity variations

    Context. The INTEGRAL/SPI X/γ-ray spectrometer (20 keV–8 MeV) is an instrument for which recovering source intensity variations is not straightforward and can constitute a difficulty for data analysis. In most cases, determining the source intensity changes between exposures is largely based on a priori information.
    Aims. We propose techniques that help to overcome the difficulty related to source intensity variations and make this step more systematic. In addition, the constructed “synthetic” light curves should permit us to obtain a sky model that describes the data better and optimizes the source signal-to-noise ratios.
    Methods. For this purpose, the time intensity variation of each source was modeled as a combination of piecewise segments of time during which a given source exhibits a constant intensity. To optimize the signal-to-noise ratios, the number of segments was minimized. We present a first method that takes advantage of previous time series that can be obtained from another instrument on board the INTEGRAL observatory. A data segmentation algorithm was then used to synthesize the time series into segments. The second method no longer needs external light curves, but relies solely on SPI raw data. For this, we developed a specific algorithm that involves the SPI transfer function.
    Results. The time segmentation algorithms developed here solve a difficulty inherent to the SPI instrument, namely the intensity variations of sources between exposures, and allow us to obtain more information about the sources’ behavior.
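
    The minimal-number-of-segments idea corresponds to classic penalized change-point detection; a small dynamic-programming sketch (illustrative only, not the SPI-specific algorithms, which involve the instrument transfer function; the quadratic segment cost and the per-segment penalty are assumptions):

```python
import numpy as np

def segment_lightcurve(y, penalty):
    """Piecewise-constant segmentation: choose change points that
    minimize the sum of within-segment squared residuals plus a fixed
    penalty per segment, which keeps the number of segments small
    (and hence the signal-to-noise per segment high)."""
    n = len(y)
    # Prefix sums give an O(1) segment cost: fitting a constant (the
    # mean) to y[i:j] costs  sum(y^2) - (sum y)^2 / len.
    s1 = np.concatenate(([0.0], np.cumsum(np.asarray(y, float))))
    s2 = np.concatenate(([0.0], np.cumsum(np.asarray(y, float) ** 2)))
    def seg_cost(i, j):
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)
    best = np.full(n + 1, np.inf)   # best[j]: optimal cost of y[:j]
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + seg_cost(i, j) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i
    # Backtrack the change points.
    cuts, j = [], n
    while j > 0:
        cuts.append(j)
        j = prev[j]
    return sorted(cuts)  # end index of each constant segment
```

    Raising the penalty yields fewer, longer segments; the first SPI method effectively runs such a segmentation on an external light curve, while the second works directly on the raw data through the transfer function.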